Brain 9

28/05/2023

Github: https://github.sydney.edu.au/yzha6482/BrainBox_3888

import pandas as pd
import os
import pickle
import sys
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score
from sklearn.metrics import mean_squared_error
from tsfresh import extract_features
from tsfresh.utilities.dataframe_functions import impute
from tsfresh.feature_selection import select_features
from tsfresh.utilities.dataframe_functions import roll_time_series
from tsfresh.feature_extraction import EfficientFCParameters
from multiprocessing import Pool
from sklearn.feature_selection import RFE
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import KFold, train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

Executive Summary


Our project focuses on developing an effective and efficient method for eye movement classification for the development of a human-computer interface (HCI). Our research and development spans two disciplinary spheres, a Data Science oriented approach and a Physics based approach; through an integration of these areas, we strived to develop a practical and effective HCI design.

Throughout the project it became apparent that the SpikerBox was heavily susceptible to artefacts, affecting the ability to extract useful information. We found that this was best addressed through specific electrode placements (discussed in greater detail later) that allowed for optimal data extraction while mitigating unwanted noise, and a Gaussian filter that reduced undesirable high-frequency noise.

Event detection techniques such as zero-crossing and amplitude thresholding allowed us to efficiently optimise the responsiveness of our product. Our evaluation strategies focused on aspects like accuracy and complexity, using cross validation and related methods. The accuracy of our classifier model was reasonable, corroborated by live testing data. We demonstrated that use of a dual-threshold event detection approach and appropriate classifier model results in minimal misclassifications and few missed events.

However, despite our aim of minimising latency, we still fell short of our desired latency time. This was due to suboptimal streaming code and computationally expensive feature extraction. Additionally, issues like a limited training data set and event splitting meant that accuracy for different users was not as high as our testing accuracy.

Nevertheless, our product goes quite far towards achieving our aim of developing an effective and robust HCI technology, which was only possible via the integration of both disciplines. Future work will aim to improve upon the identified limitations and enhance the model's adaptability and performance.

Aim and Background


Human Computer Interfaces (HCIs) remain at the vanguard of technological advancement, being a focus of substantial scholarly and industrial development. The measurement of bio-signals, such as eye movements and brain activity in electro-oculography (EOG) and electro-encephalography (EEG) based technologies respectively, has been applied in a wide range of contexts. Current applications include wearable EOG glasses, which are commercially available from IMEC (IMEC, 2018) or JINS, and prototypes by researchers such as AttentivU by MIT (Kosmyna et al, 2019). EOGs specifically are cost effective and offer a larger degree of flexibility and applicability compared to other HCI technologies.

Most studies focus on developing assistive technologies for disabled individuals, whether targeted at those with minor motor disabilities or severe conditions like ALS or MS (Demriel et al). However, the versatility of EOG technology provides scope for broad potential applications, such as controlling video games (Chang, 2019), hands-free writing (Chang et al, 2017) and even controlling a wheelchair (Kosmyna et al, 2019).

Our product aims to exploit the strengths of the technology whilst mitigating its weaknesses. EOG signals are susceptible to artefacts, which necessitates careful adjustment of the physical setup and data processing to extract useful information. There also exists a trade-off between input diversity, accuracy and low latency on one hand, and a complex and costly physical setup on the other. We strive to overcome these limitations with appropriate physical optimisation strategies, such as careful electrode placements, as well as software-based methods such as Gaussian filtering and event detection mechanisms in our streaming code.

To demonstrate the capabilities of the technology, we have implemented a Tetris game using eye movements and blinks. However, our product functions by producing keyboard outputs, making it adaptable and suitable to different contexts and user needs. Given that the HCI field is inherently interdisciplinary, requiring specialist knowledge in diverse fields such as engineering, computer science and physics (Hartson, 1998), deploying our product necessitated active collaboration and integration of our team’s disciplinary expertise.

Specifically, Data Science members were instrumental in developing classification models, referring to the application of supervised learning algorithms to predict labels assigned to different classes of eye movement. Simultaneously, the process of data collection and product evaluation was facilitated by the substantive knowledge and experimental approach of Physics members, ensuring the product was accurate and reliable. Ultimately, the amalgamation of these disciplinary spheres was central to achieving our aim and delivering an effective product.

Method


Figure 1: placement diagram

Electrode Placement


Our product obtains left, right and blink movements using the Backyard Brains Spiker Box to record electrooculography (EOG) data, with electrodes placed as seen in figure 1. Relying on the charge differential between the cornea and retina (which forms an electric dipole), electrode sensors can detect the change in potential when a left or right movement is performed, and for blinks, a combination of EOG and electromyographic (EMG) signals produce an output (PHYS3888 Github; Chang, 2019). Prompted by research (Denney, D., Denney, C., 1984) we suspected that any configurations where the electrodes were horizontally aligned on the user’s face would not register blink signals.

We tested this hypothesis by qualitatively examining each of configurations A to E, performing a sample of around 10 left, right and blink eye movements for each configuration and observing the generated signal. From there we selected the best electrode placement by assessing whether lefts and rights could be accurately classified and determining whether blinks were recorded.

Data Collection & Filtration


Data is sent in real time by USB to a computer, where it is parsed by a Python script and saved in a text file. To eliminate noise emerging from the user’s other bodily activity or the power supply frequency, a Gaussian filter is applied by Fourier transforming the input signal and multiplying it with a Gaussian function. This Gaussian is centred at zero frequency, and its width is controlled by a parameter “sigma_gauss” (the standard deviation $\sigma$) such that the Full Width at Half Maximum is $$2\sqrt{2\ln 2} \cdot \sigma.$$ Increasing this parameter broadens the Gaussian function, allowing higher frequencies to pass through unattenuated and resulting in more noise, while a low value means that only very low frequencies are accepted, filtering out movement signals as well as noise. To determine the appropriate level of filtration, we varied sigma_gauss around the default value of 25 and qualitatively observed the resultant signals. This default value produced the “smoothest” signal, being unaffected by noise while preserving primary features.
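The filtering step described above can be sketched as follows. This is a minimal illustration assuming a real-valued signal and a known sampling rate `fs`; the function name `gaussian_lowpass` and its exact signature are our own for illustration, not the project's actual streaming code:

```python
import numpy as np

def gaussian_lowpass(signal, fs, sigma_gauss=25.0):
    """Filter a 1-D signal by multiplying its spectrum with a Gaussian
    centred at zero frequency. `fs` is the sampling rate in Hz;
    increasing `sigma_gauss` lets more high-frequency content through."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    window = np.exp(-0.5 * (freqs / sigma_gauss) ** 2)  # Gaussian centred at 0 Hz
    return np.fft.irfft(spectrum * window, n=len(signal))
```

With sigma_gauss = 25, a 2 Hz movement signal passes almost unattenuated, while a 50 Hz mains component is suppressed by a factor of roughly exp(-2).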

Initial testing indicated that eye movements may last up to 0.6 seconds and blinks approximately 0.4 seconds; hence, we set our model's moving window size to 1 second. In order to create our classifier model, we gathered and filtered 100 examples of each kind of eye movement.
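The 1-second moving window can be sketched as a simple generator; the 0.25 s step size below is purely illustrative, not the overlap actually used in our streaming code:

```python
import numpy as np

def sliding_windows(signal, fs, window_sec=1.0, step_sec=0.25):
    """Yield overlapping windows of `window_sec` seconds over `signal`,
    advancing by `step_sec` seconds each step."""
    size = int(window_sec * fs)
    step = int(step_sec * fs)
    for start in range(0, len(signal) - size + 1, step):
        yield signal[start:start + size]
```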

Developed model


Our team developed a predictive model using a combination of feature extraction, feature selection and machine learning techniques. We leveraged the Python package tsfresh to extract features from different eye movements and subsequently utilised Recursive Feature Elimination (RFE) to rank the features in order of their significance. Based on this ranking, we selected the top six features deemed most relevant to our task. Finally, we employed a Decision Tree Classifier (dtree) to build our model and validated it using a 5-fold cross-validation strategy. This decision was guided by considerations of accuracy and computational complexity.

Event Detection


We implemented a zero-crossing and amplitude threshold for event detection. The EOG signal fluctuates about a baseline signal when the user is inactive. Eye movement signals produce a deviation which causes fewer crossings about this median signal, allowing windows that contain eye movements to be differentiated. Initially, crossings were calculated by comparing each successive data point to the previous; however, we optimised the process by employing the NumPy library and reduced the latency to mere milliseconds. The number of zero crossings is then compared to a preset threshold value of 8, which we adopted via an iterative, trial-and-error procedure, choosing the value that maximised genuine detections while preventing noise from registering as an event.

However, we found that this method is ineffective when the signal is unstable about the median. Hence, as implemented by Chambayil et al, 2010, we introduced an amplitude threshold which compares the difference between the maximum and minimum output signal in each window. The threshold set was 100 units. Thus, a signal is only classified as an event if both conditions are met, reducing the likelihood of noise being misclassified as motion.
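The dual-threshold check can be sketched as below, assuming an event is flagged when the crossing count falls below the threshold of 8 and the peak-to-peak amplitude exceeds 100 units; `is_event` is an illustrative reconstruction rather than our production streaming code:

```python
import numpy as np

def is_event(window, crossing_threshold=8, amplitude_threshold=100):
    """Flag a window as an event only if BOTH conditions hold: few
    crossings about the median (the signal has deviated from baseline)
    AND a large peak-to-peak amplitude."""
    centred = window - np.median(window)
    # vectorised crossing count: sign changes between consecutive samples
    crossings = np.count_nonzero(np.diff(np.sign(centred)) != 0)
    peak_to_peak = window.max() - window.min()
    return crossings < crossing_threshold and peak_to_peak > amplitude_threshold
```

Requiring both conditions means rapid low-amplitude noise (many crossings) and an unstable baseline (small peak-to-peak) are each rejected on their own.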

Feature Selection


For feature extraction, we used the tsfresh package, which essentially automates the process, eliminating the need to manually devise and compute features. We initially used the comprehensive parameter set (ComprehensiveFCParameters), though the computational time was significant at around 3 minutes per file. Since low latency was critical, we switched to the efficient set (EfficientFCParameters), expecting that this would not compromise the quality of extracted features. In doing so, we obtained faster processing times of approximately 3 seconds per file.

To identify the most important features, we utilised a Recursive Feature Elimination (RFE) approach. This was conducted to avoid overfitting, whilst also simplifying the model and improving overall performance by reducing the feature space and improving interpretability (Jeon & Oh, 2020). We were also conscious of “overlapping features” between certain attributes, which carried redundant information. Therefore, we removed these features to eliminate potential bias and computational inefficiencies.

Model Selection


We experimented with multiple algorithms to select the optimal model: Decision Trees (DT), Support Vector Machines (SVM), K-Nearest Neighbours (KNN), and Random Forest (RF) classifiers. We anticipated that the models would perform differently given the characteristics of our data and the different approach adopted by each. Bagging was also performed for the DT and RF models to avoid overfitting.

The balance of in-sample and out-of-sample accuracies was the cornerstone of our model selection procedure. In Python, we accomplished this by dividing our data into two sets: training and testing, with 5-fold cross validation. Each model was trained on the training set before being assessed on both the training and testing sets.

In addition to these factors, computational complexity was central to our decision-making process. We needed to strike a compromise between computing cost and accuracy, as we wanted a model that not only had high accuracy but also processed data quickly, guaranteeing minimal latency.

Accuracy, Robustness and Latency


Determining accuracy in a live streaming condition entails consideration of various metrics. The first is simply the accuracy of classification, determined by comparing a user’s intended input with the ultimate classifier output. We recorded 80 movements of each type and the corresponding classification in a table, assigning a “1” if correctly classified and “0” otherwise. Another metric obtained is the “miss rate”: any actions which were not detected by the streaming code were noted separately, and the proportion of non-detections to total movements was calculated.
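The two metrics can be computed as follows; `evaluate_session` and the convention of recording a missed detection as `None` are illustrative assumptions, not our actual bookkeeping:

```python
def evaluate_session(intended, detected):
    """Return (classification accuracy, miss rate) for a live session.
    `intended` lists the movements the user performed; `detected` holds
    the classifier output for each, or None if the streaming code never
    registered the movement."""
    misses = sum(1 for d in detected if d is None)
    hits = sum(1 for i, d in zip(intended, detected) if d == i)
    registered = len(intended) - misses
    accuracy = hits / registered if registered else 0.0
    miss_rate = misses / len(intended)
    return accuracy, miss_rate
```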

Live testing was completed by only one member, as all training data had previously been exclusively collected from them. However, similar determinations of accuracy were made for other members to ensure overfitting to data from a single user was not taking place.

Latency was simply measured by running the Tetris game and using a stopwatch to determine the time difference between an eye movement and the corresponding action being reflected in-game. The stopwatch was started simultaneously with the user’s movement, and stopped once the piece was moved (adjusting for human reaction time). 30 trials were recorded and an average taken, providing a value for the typical latency. In addition, we relied on the time module to ascertain how long each task / process in our streaming code took to run. This allowed us to identify the primary sources of latency.
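The per-stage timing can be sketched with a small wrapper; `time.perf_counter` is used here as a stand-in for however the time module was actually invoked in the streaming code:

```python
import time

def timed(label, func, *args, **kwargs):
    """Run `func`, report how long it took, and return (result, seconds).
    Useful for attributing streaming latency to individual stages."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.4f} s")
    return result, elapsed
```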

Results


from IPython.display import display, HTML

with open('extracted_features.pkl', 'rb') as f:
    features_df = pickle.load(f)

    # Save extracted features to a CSV file
    features_df.to_csv('extracted_features.csv', index=False)

    # Replace NaN and Inf values produced during feature extraction, otherwise sklearn
    # raises "ValueError: Input contains NaN, infinity or a value too large for dtype('float64')"
    features_df[:] = np.nan_to_num(features_df)

    features_df = features_df.drop('value__query_similarity_count__query_None__threshold_0.0', axis = 1)

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(features_df.drop('label', axis=1), features_df['label'], test_size=0.2, random_state=42)

    # Create a random forest classifier to use for feature selection
    rfe_method = RFE(
        RandomForestClassifier(n_estimators=20, random_state=10),
        n_features_to_select=8,
        step=2,
    )

    rfe_method.fit(X_train, y_train)

    # Keep the most important features, dropping two that carried
    # overlapping (redundant) information, leaving six
    important_features = X_train.columns[(rfe_method.get_support())]
    important_features = [important_features[0], important_features[1], important_features[3],
                          important_features[4], important_features[6], important_features[7]]
    important_features = sorted(important_features)
    
    X = features_df.loc[:,important_features].values
    y = features_df['label'].values

    # Split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=60)

    kf = KFold(n_splits=5, shuffle=True, random_state=60)

    accuracy_test = []
    accuracy_train = []

    # Set parameter grid
    param_grid = {
        'max_depth': [2, 3, 4, 5, 6, 7, 8, 9, 10],  # max depth
        'ccp_alpha': [0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.1]  # minimum cost-complexity parameter
    }

    # generate a decision tree classifier
    dtree = DecisionTreeClassifier(random_state=22)

    # Create a grid search object
    grid_search = GridSearchCV(estimator=dtree, param_grid=param_grid, cv=5, scoring='accuracy')

    # Perform grid search on the training set
    grid_search.fit(X_train, y_train)

    # Create a new model using optimal parameters
    dtree = grid_search.best_estimator_
    bagging_dtree = BaggingClassifier(estimator=dtree, n_estimators=10, random_state=0)
    bagging_dtree.fit(X_train, y_train)


    # SVM and KNN accuracies lists
    accuracy_test_svm = []
    accuracy_train_svm = []
    accuracy_test_knn = []
    accuracy_train_knn = []
    accuracy_test_rf = []
    accuracy_train_rf = []

    # SVM and KNN models
    svm = SVC(kernel='linear', random_state=42)
    knn = KNeighborsClassifier(n_neighbors=5)
    # RandomForest model
    rf = RandomForestClassifier(n_estimators=100, random_state=42)
    bagging_rf = BaggingClassifier(estimator=rf, n_estimators=10, random_state=0)
    bagging_rf.fit(X_train, y_train)


    for train_index, test_index in kf.split(X):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]

        #Train and evaluate DecisionTree
        bagging_dtree.fit(X_train,y_train)
        y_pred = bagging_dtree.predict(X_test)
        # dtree.fit(X_train,y_train)
        # y_pred = dtree.predict(X_test)
        accuracy_test.append(accuracy_score(y_test, y_pred))
        accuracy_train.append(accuracy_score(y_true = y_train, y_pred = bagging_dtree.predict(X_train)))

        # Train and evaluate SVM
        svm.fit(X_train, y_train)
        y_pred_svm = svm.predict(X_test)
        accuracy_test_svm.append(accuracy_score(y_test, y_pred_svm))
        accuracy_train_svm.append(accuracy_score(y_true = y_train, y_pred = svm.predict(X_train)))

        # Train and evaluate KNN
        knn.fit(X_train, y_train)
        y_pred_knn = knn.predict(X_test)
        accuracy_test_knn.append(accuracy_score(y_test, y_pred_knn))
        accuracy_train_knn.append(accuracy_score(y_true = y_train, y_pred = knn.predict(X_train)))

        bagging_rf.fit(X_train, y_train)
        y_pred_bagging_rf = bagging_rf.predict(X_test)
        accuracy_test_rf.append(accuracy_score(y_test, y_pred_bagging_rf))
        accuracy_train_rf.append(accuracy_score(y_true = y_train, y_pred = bagging_rf.predict(X_train)))


    import matplotlib.pyplot as plt
    # accuracy_test, accuracy_test_svm, accuracy_test_knn and accuracy_test_rf hold the per-fold scores
    data_to_plot = [accuracy_test, accuracy_test_svm, accuracy_test_knn, accuracy_test_rf]

    plt.figure(figsize=(7, 6))
    plt.boxplot(data_to_plot)

    plt.title('Accuracy Scores Across 5-Fold Cross Validation')
    plt.xlabel('Classifier')
    plt.ylabel('5 fold cross validation accuracy')
    plt.xticks([1, 2, 3, 4], ['Decision Tree', 'SVM', 'KNN', 'Random Forest'])
    plt.figtext(0.5, 0.01, "Figure 2: Accuracy comparison across different classifiers using 5-Fold Cross Validation.", ha="center", fontsize=12)
    plt.show()
    accuracy_dict = {
        'Model': ['Decision Tree', 'SVM', 'KNN', 'Random Forest'],
        'Train Accuracy': [
            round(np.mean(accuracy_train), 4),
            round(np.mean(accuracy_train_svm), 4),
            round(np.mean(accuracy_train_knn), 4),
            round(np.mean(accuracy_train_rf), 4)
        ],
        'Test Accuracy': [
            round(np.mean(accuracy_test), 4),
            round(np.mean(accuracy_test_svm), 4),
            round(np.mean(accuracy_test_knn), 4),
            round(np.mean(accuracy_test_rf), 4)
        ]
    }

    # Convert dictionary to DataFrame
    accuracy_df = pd.DataFrame(accuracy_dict)

    # Convert DataFrame to HTML
    html_table = accuracy_df.to_html(index=False)

    # Add CSS to center the table and add a border
    html_table = html_table.replace('<table', '<table style="border:1px solid black; text-align:center; margin-left:auto; margin-right:auto;"')

    # Add a centered caption at the bottom of the table
    html_table += '<caption style="caption-side: bottom; text-align: center;">Table 1: Accuracy of models</caption>'

    # Display the HTML
    display(HTML('<div style="text-align:center;">' + html_table + '</div>'))
Model           Train Accuracy   Test Accuracy
Decision Tree   0.9799           0.9721
SVM             0.9737           0.9752
KNN             0.9729           0.9721
Random Forest   0.9923           0.9722
Table 1: Accuracy of models

Feature Selection


We initially selected the top eight features provided by tsfresh, although this produced an accuracy of 1.0 on the training data, which indicated overfitting. After eliminating overlapping, redundant features, we were left with six features, producing a slightly lower and more reasonable accuracy of 0.985. The results of 5- and 10-fold cross-validation are included in the appendix. These two steps enhanced the discriminative power of the remaining features and improved overall accuracy.

Model Selection


The decision tree classifier model achieved a training accuracy of 99.85%, indicating effective learning from the training data. However, the test accuracy dropped to 95.98%, suggesting overfitting due to excessive tree depth. To address this, we performed depth evaluation, pruning, and parameter optimization, resulting in a training accuracy of 97.99% and a test accuracy of 97.21%.

The SVM model demonstrated strong generalisation with a training accuracy of 97.37% and a slightly higher test accuracy of 97.52%. The KNN model exhibited consistent performance with a training accuracy of 97.29% and a test accuracy of 97.21%, indicating good generalisation without overfitting.

The random forest model initially showed overfitting, with a training accuracy of 99.23% and a test accuracy of 97.22%. After parameter optimization, the training accuracy improved to 98.14%, and the test accuracy reached 97.52%. Incorporating bagging techniques addressed overfitting and improved generalisation to unseen data.

Considering the nature of the data and classification categories, we chose the decision tree classifier as our final model. It offered interpretability and aided our understanding of the decision-making process, making it suitable for multi-class classification. Additionally, the runner-up model was the random forest model, though this took approximately 0.4 seconds longer for prediction in a 5-fold cross-validation setting and 65 seconds longer for training compared to DT.

Though current literature supports a range of models, such as dynamic positional warping (DPW)(Chang & Im, 2014), support vector machines (SVM) (Chang et al., 2017), and hidden Markov models (HMM) (Fang & Shinozaki, 2018), our priority was balancing precision with interpretability and computational efficiency. A DT model in particular is faster whilst still demonstrating substantial test accuracy, making it suitable for our product.

Electrode Placement


As seen in the Appendix, placements A and B produced clear blink signals while the remaining configurations did not. This was consistent with the literature; the signal obtained from blinks mainly occurs due to an increase in potential above the eyes, as the upper eyelid conducts the positive charge of the cornea (Chang, 2019). When electrodes are horizontally aligned though, this change in potential is equal across the electrodes. Hence, only configurations A or B are appropriate to distinguish between left, right and blink movements.

Position A yields a blink signal which initially spikes “upwards”, because the uppermost electrode is at a higher potential, while in position B blink signals spike “downwards”, because the opposite electrode is at the higher potential; i.e., the signals are effectively mirrored about the time axis.

While we adopted position A, the results indicate that either choice would suffice for blink detection; however, the classifier must be trained on data collected in the same configuration used in live testing, or blinks will be misclassified.

Live Testing


## Live Testing table
livetest = pd.read_csv('livetesting.csv', delimiter=',',engine='python') 
from IPython.display import display, HTML
def display_side_by_side(dfs: list, captions: list):
    """Render DataFrames side by side, each with a caption row appended."""
    output = ""
    combined = dict(zip(captions, dfs))
    for caption, df in combined.items():
        caption_row = f""" <tr>\n       <td >{caption}</td>\n    </tr>\n """
        table = df.set_index('Label').style.set_table_attributes(
            "style='display:inline; font-size:110%' "
        )._repr_html_()
        # strip the closing tags, append the caption row, then re-close the table
        output += table.rstrip('   </tr>\n  </tbody>\n</table>') + caption_row + """   </tr>\n  </tbody>\n</table>"""
        output += "\xa0" * 12
    display(HTML(output))
# left accuracy
left = livetest.iloc[40:82, 1:3]
left.reset_index(drop=True, inplace=True)
left.columns = left.iloc[0]
left = left[1:]
left = left.dropna()
left.columns = ['PredictedLabel', 'IsCorrect']
left_accuracy = left['IsCorrect'].astype(float).mean()

# right accuracy
right = livetest.iloc[40:82,5:7]
right.reset_index(drop=True, inplace=True)
right.columns = right.iloc[0]
right = right[1:]
right = right.dropna()
right.columns = ['PredictedLabel', 'IsCorrect']
right_accuracy = right['IsCorrect'].astype(float).mean()

# blink accuracy
blink = livetest.iloc[40:82, 9:11]
blink.reset_index(drop=True, inplace=True)
blink.columns = blink.iloc[0]
blink = blink[1:]
blink = blink.dropna()
blink.columns = ['PredictedLabel', 'IsCorrect']
blink_accuracy = blink['IsCorrect'].astype(float).mean()

# Format the accuracy values as percentages
left_accuracy_str = f'{left_accuracy:.2%}'
right_accuracy_str = f'{right_accuracy:.2%}'
blink_accuracy_str = f'{blink_accuracy:.2%}'

# Create accuracy table
accuracy_table = pd.DataFrame({
    'Label': ['Left', 'Right', 'Blink'],
    'Accuracy': [left_accuracy_str, right_accuracy_str, blink_accuracy_str]
})

# display(accuracy_table)

# test1 left
left1 = livetest.iloc[88:98,1:3]
left1.reset_index(drop=True, inplace=True)
left1.columns = left1.iloc[0]
left1 = left1[1:]
left1 = left1.dropna()
left1.columns = ['PredictedLabel', 'IsCorrect']
left1_accuracy = left1['IsCorrect'].astype(float).mean()

#test1 right
right1 = livetest.iloc[88:98,5:7]
right1.reset_index(drop=True, inplace=True)
right1.columns = right1.iloc[0]
right1 = right1[1:]
right1 = right1.dropna()
right1.columns = ['PredictedLabel', 'IsCorrect']
right1_accuracy = right1['IsCorrect'].astype(float).mean()

#test1 blinks
blink1 = livetest.iloc[88:102,9:11]
blink1.reset_index(drop=True, inplace=True)
blink1.columns = blink1.iloc[0]
blink1 = blink1[1:]
blink1 = blink1.dropna()
blink1.columns = ['PredictedLabel', 'IsCorrect']
blink1_accuracy = blink1['IsCorrect'].astype(float).mean()
# Format the accuracy values as percentages
left1_accuracy_str = f'{left1_accuracy:.2%}'
right1_accuracy_str = f'{right1_accuracy:.2%}'
blink1_accuracy_str = f'{blink1_accuracy:.2%}'

# Create accuracy table
accuracy_table1 = pd.DataFrame({
    'Label': ['Left', 'Right', 'Blink'],
    'Accuracy': [left1_accuracy_str, right1_accuracy_str, blink1_accuracy_str]
})

display_side_by_side([accuracy_table,accuracy_table1],['table 2:live testing','table 3:testing on other'])
Label   Accuracy
Left    85.00%
Right   80.00%
Blink   80.00%
Table 2: live testing

Label   Accuracy
Left    66.67%
Right   55.56%
Blink   46.15%
Table 3: testing on other users

As demonstrated in the tables of results, the accuracy of our product is reasonable considering the literature: Chang (2019) and Chang et al. (2017) suggest that accuracies ranging from 85% to 95% are common in the field. Our product performs best on left eye movements, with 85% accuracy, while right movements and blinks were more often misclassified, yielding lower accuracies. Qualitatively, an inspection of the wave files indicates that rights and blinks possess similar features in their relative positions of maxima and minima, the length of the event, etc. Ensuring movements were quite fast assisted in increasing accuracy to a degree.

The main issue our product faced was “double classification”: when the data in a particular window includes only half of an event, the classifier interprets one movement as two separate events (and generally misclassifies both). This is an artefact of the streaming code, since we had adopted an overlapping window methodology.

Finally, the accuracy of our product on other users is clearly lower, which is indicative of our classifier overfitting to the training dataset. This is most apparent for blinks, where more than half of the events were misclassified.

Latency


From live testing, typical latency between a user action and an in-game output was approximately 3.5 s. For a product aiming to be responsive and low latency, this is somewhat high. By timing various processes in the streaming code directly, we determined the primary contributors to this delay. Tasks such as computing zero-crossings and Gaussian filtering contribute negligible latency, on the order of 0.001 seconds, due to the usage of DataFrame structures and NumPy’s computational capabilities.

However, the process of data transmission from the Arduino to a Python DataFrame takes around 1 second since our streaming code processes data in “batches” / windows. Additionally, the tsfresh feature extraction process contributes between 1.5 and 2 seconds of delay due to the computationally demanding requirements, despite using a minimal number of features. In the Appendix we include a figure from Christ, M, 2018 which illustrates that many features we have used have a relatively high runtime (such as agg_linear_trend and ratio-beyond_r_sigma).

Product illustration


As outlined in data collection, the potential differences generated by eye movements are measured as signals by the SpikerBox, which are fed into our classifier model. The classifier then generates one of three distinct labels based on the user action, as seen in Appendix 3. The Python package “pyautogui” is used to output a direct keyboard action corresponding to the input. This occurs whilst our front-end Tetris script is running separately. The codebase for the game was modified to decrease the game speed and increase the height of the board.
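The label-to-keystroke dispatch can be sketched as below. The key bindings shown are assumptions for illustration; in deployment the `press` argument would be `pyautogui.press`, and injecting it here keeps the sketch testable without a GUI environment:

```python
# Hypothetical label-to-key bindings; the project's actual mapping may differ.
KEY_MAP = {"left": "left", "right": "right", "blink": "up"}

def dispatch(label, press):
    """Translate a classifier label into a key press via the supplied
    `press` callable; unrecognised labels (e.g. noise) produce no output."""
    key = KEY_MAP.get(label)
    if key is not None:
        press(key)
    return key
```

The Tetris script then simply listens for ordinary keyboard input, which is what makes the product adaptable to other keyboard-driven software.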

Figure 3: deployment process

Discussion


Limitation


The most significant limitation stems from the constrained variety and size of the training data used, as it is vital to have a large and representative dataset to learn patterns reliably and generalise successfully. Hence, resilience and reliability in categorising a broader range of eye movements would have improved with a larger sample set.

Another limitation was that we did not extensively investigate filtering techniques like band-pass and notch filters in our exploration of feature extraction methods and thresholds. More research into these techniques would have improved our signal processing pipeline and feature quality. For example, the clarity of eye movement signals would have improved by using band-pass filtering to remove high-frequency noise and notch filters for power line interference.

Furthermore, the existing system may have room to be optimised for responsiveness and reduced latency. Our preferred approach provides high levels of computing efficiency, but further enhancements are possible; for instance, further latency reductions could be achieved by re-evaluating the feature extraction procedure or streamlining the codebase.

Our windowing strategy for signal processing and feature extraction also allowed a single eye movement to be split across windows and classified as two separate events. Hence, accuracy could be enhanced by refining the overlapping window technique.

Additionally, while decision trees are straightforward and allow for strong interpretability, more sophisticated models, such as neural networks, may be better at capturing the interplay of several factors.

Finally, we acknowledge that the quality of the characteristics used to train our model ultimately determines how well it performs. Although we have made every effort to extract all relevant characteristics, it is still possible that we have overlooked some. More sophisticated feature extraction methods may be used in the future to enhance the model's performance.

Conclusion


In conclusion, we have successfully developed an HCI utilising EOG technology which allows the user to play a Tetris game. Through extensive research and collaboration, we have designed a product which effectively implements machine learning methods to provide decently accurate control of software using the eyes. This technology could be of particular benefit for users with disabilities such as ALS.

We have considered numerous evaluation methods and strived to optimise our product, both in terms of the physical setup and the software. However, there remains scope for further development and improvement of the technology. For instance, latency and robustness were identified as weaknesses of the product, which could be addressed with refined training and streaming methods. Accuracy could also potentially be improved by re-evaluating our event detection strategy. Overall, though, our project demonstrates the potential of EOG in providing accessible control of technology.

Contribution


Jaymes Gourlas

I was the group member who loaned the SpikerBox. Consequently, I was responsible for the collection of all training data, which was completed using the streaming code template. Along with various Data members, I was involved in adjusting the streaming code throughout the project, including event detection, incorporating our classifier, etc. I was also the member assigned to the development and submission of our weekly meeting slides. For the final report, I took charge of the method and results sections pertaining to the Physics aspects of the project, and helped with the aim and background and the final proofreading. This included live testing, and collecting and analysing data for the results.

Pranjal Pokharel

As a physics student, I played various important roles throughout the project. In the initial stages I was a driving force behind the use of Tetris as the end product, and I also assisted Jaymes with some of the data collection. I was responsible for creating the initial Tetris front-end feature, which was later adjusted by my fellow data science students. Towards the end of the project I created the initial presentation draft, focusing on the graphics, structure and some of the information, and then played an active role in refining it with my team members. In terms of the report, I was responsible for the aim and background section and contributed to the executive summary. I carried out a final proofread across all sections, and held an organisational role in keeping our overall word count acceptable by adjusting different sections.

Karmen Yang

Within the project, my primary contribution was the development and implementation of the streaming code. I initially developed a model, but its accuracy was poor; I later helped a group mate develop our more accurate model. I also contributed to optimising and debugging the code for efficient processing, and collaborated with the front-end team to ensure seamless integration of eye movements into the Tetris game. For the report, I was responsible for integrating our data and writing into the report and for enhancing its presentation in HTML format.

Keenan Yong

As a member of Brain9, I played a crucial role in various aspects of the project, contributing my expertise to both the technical and user-facing components. From conducting thorough data analysis to optimising feature selection and model selection, I focused on improving the performance and accuracy of our algorithms. In addition, I dedicated my efforts to refining the live streaming functionality, troubleshooting issues and implementing solutions to enhance the overall codebase. Furthermore, I actively participated in the development of the front-end interface for our Tetris game, collaborating closely with the team to design an intuitive and user-friendly experience. By integrating the classification models into the front-end, we ensured a smooth and enjoyable gameplay experience for our users. As we approached the final stages of the project, I took on the responsibility of contributing to the feature selection process and summarising our findings in the conclusion section of the report.

Yitong Zhao

I worked as the data student lead and completed most of the back-end coding, including streaming (zero crossings, movement detection, and the data science part of data processing), feature extraction (in full), model selection (everything except the decision tree and bagging), and eventually combined the physics code and the data code. I worked with Jaymes and Karmen on the latency analysis, and with Keenan on the model analysis. I created the basic model and the final model, and wrote the report sections on data collection, the developed model, model selection, latency, limitations, current literature and classifier accuracy, including interdisciplinary content.

References

Appendix


Appendix 1: Model Evaluation and Performance Metrics

with open('extracted_features.pkl', 'rb') as f:
    features_df = pickle.load(f)

features_df.to_csv('extracted_features.csv', index=False)

features_df[:] = np.nan_to_num(features_df)

features_df = features_df.drop('value__query_similarity_count__query_None__threshold_0.0', axis = 1)

with open('dtree_model.pkl', 'rb') as f1:
    model1 = pickle.load(f1)

with open('dtree_model_new.pkl', 'rb') as f2:
    model2 = pickle.load(f2)

rfe_method = RFE(
    RandomForestClassifier(n_estimators=20, random_state=10),
    n_features_to_select=8,
    step=2,
)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(features_df.drop('label', axis=1), features_df['label'], test_size=0.2, random_state=42)

rfe_method.fit(X_train, y_train)

important_features_model1 = X_train.columns[(rfe_method.get_support())]


features_df1 = features_df

important_features_model1 = sorted(important_features_model1)

# print(important_features_model1)

X1 = features_df1.loc[:, important_features_model1].values
y1 = features_df1['label'].values


# Model 2 uses a 6-feature subset of the 8 features selected for Model 1
important_features_model2 = [important_features_model1[i] for i in (0, 1, 3, 4, 6, 7)]
important_features_model2 = sorted(important_features_model2)
features_df2 = features_df

X2 = features_df2.loc[:, important_features_model2].values
y2 = features_df2['label'].values

# Split the data into training and testing sets
X_train1, X_test1, y_train1, y_test1 = train_test_split(X1, y1, test_size=0.2, random_state=42)

X_train2, X_test2, y_train2, y_test2 = train_test_split(X2, y2, test_size=0.2, random_state=42)

# Make predictions on the testing set using Model 1
y_pred_model1 = model1.predict(X_test1)
accuracy_model1 = accuracy_score(y_test1, y_pred_model1)

# Make predictions on the testing set using Model 2
y_pred_model2 = model2.predict(X_test2)
accuracy_model2 = accuracy_score(y_test2, y_pred_model2)

print("Accuracy of Model 1 (8 features) :", accuracy_model1)
print("Accuracy of Model 2 (6 features) :", accuracy_model2)

#### 5 fold ####
# Cross-validation with Model 1
cv_scores_model1 = cross_val_score(model1, X1, y1, cv=5)
mean_cv_accuracy_model1 = np.mean(cv_scores_model1)

# Cross-validation with Model 2
cv_scores_model2 = cross_val_score(model2, X2, y2, cv=5)
mean_cv_accuracy_model2 = np.mean(cv_scores_model2)

print("Cross-validated (k = 5) Accuracy of Model 1 (8 features):", mean_cv_accuracy_model1)
print("Cross-validated (k = 5) Accuracy of Model 2 (6 features):", mean_cv_accuracy_model2)

#### 10 fold ####
# Cross-validation with Model 1
cv_scores_model1_10 = cross_val_score(model1, X1, y1, cv=10)
mean_cv_accuracy_model1_10 = np.mean(cv_scores_model1_10)

# Cross-validation with Model 2
cv_scores_model2_10 = cross_val_score(model2, X2, y2, cv=10)
mean_cv_accuracy_model2_10 = np.mean(cv_scores_model2_10)

print("Cross-validated (k = 10) Accuracy of Model 1 (8 features):", mean_cv_accuracy_model1_10)
print("Cross-validated (k = 10) Accuracy of Model 2 (6 features):", mean_cv_accuracy_model2_10)

# Model 1 performance
classification_report_model1 = classification_report(y_test1, y_pred_model1)
print("Classification Report for Model 1:")
print(classification_report_model1)

# Model 2 performance
classification_report_model2 = classification_report(y_test2, y_pred_model2)
print("Classification Report for Model 2:")
print(classification_report_model2)
Accuracy of Model 1 (8 features) : 1.0
Accuracy of Model 2 (6 features) : 0.9846153846153847
Cross-validated (k = 5) Accuracy of Model 1 (8 features): 0.9504807692307693
Cross-validated (k = 5) Accuracy of Model 2 (6 features): 0.9597596153846155
Cross-validated (k = 10) Accuracy of Model 1 (8 features): 0.956534090909091
Cross-validated (k = 10) Accuracy of Model 2 (6 features): 0.9625946969696969
Classification Report for Model 1:
              precision    recall  f1-score   support

           B       1.00      1.00      1.00        20
           L       1.00      1.00      1.00        24
           R       1.00      1.00      1.00        21

    accuracy                           1.00        65
   macro avg       1.00      1.00      1.00        65
weighted avg       1.00      1.00      1.00        65

Classification Report for Model 2:
              precision    recall  f1-score   support

           B       1.00      0.95      0.97        20
           L       1.00      1.00      1.00        24
           R       0.95      1.00      0.98        21

    accuracy                           0.98        65
   macro avg       0.98      0.98      0.98        65
weighted avg       0.99      0.98      0.98        65
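
A confusion matrix would make the per-class errors in these reports explicit. The snippet below is a self-contained sketch using made-up labels for the three classes (B, L, R) rather than the project's actual test split.

```python
from sklearn.metrics import confusion_matrix

# Illustrative true/predicted labels (B = blink, L = left, R = right)
y_true = ["B", "B", "L", "L", "L", "R", "R"]
y_pred = ["B", "L", "L", "L", "L", "R", "R"]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred, labels=["B", "L", "R"])
print(cm)
# [[1 1 0]
#  [0 3 0]
#  [0 0 2]]
```

Applied to `y_test1`/`y_pred_model1` above, this would show exactly which eye movements each model confuses.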

Appendix 2: Average runtime for features

Appendix 3: Product Illustration

Appendix 4: EOG signal for blinks in position A (Left) and position B (Right)

Appendix 5: EOG signal for left movement with sigma_gauss values of 10 (Left) and 50 (Right) respectively